ABSTRACT


Image models are very useful for image coding, compression, segmentation,
interpretation, as well as image enhancement and restoration. For many
images in practical applications, statistical information is most
important. This report deals with the fundamental statistical theory of
image models, including the topics of contextual analysis, stochastic
random fields, the local and global properties of the random field, ARMA
systems, and the applications of statistical image models.
STATISTICAL IMAGE MODELING


I. INTRODUCTION

The way a digital image is processed depends largely on how it is
modeled. Generally speaking, image models are useful in image coding,
compression, interpretation, classification, and texture characterization,
as well as image enhancement and restoration. Both statistical and
structural models of images have been considered, as images contain both
statistical and structural information. Purely structural models are
too regular to be interesting. In most practical applications, especially
in the defense area, the statistical information is most important. With
these applications in mind, this report deals with the fundamental
theoretical topics in statistical image modeling. An extensive list of
references is provided to cover many publications in this important area.


II. THE CONTEXTUAL ANALYSIS


Consider a digital image with M x N picture elements, i.e., M rows
and N columns. A simple approximation of contextual dependence for
two-dimensional patterns is the Markov mesh [1], which can be regarded
as a two-dimensional Markov chain. Assume that the image is partitioned
into m x n subimages. Then this two-dimensional Markov chain is
characterized by a transition probability matrix $P = [P_{ij}]$,
where $P_{ij} = P(x_j \mid x_i)$ is the transition probability. Here
$x_i$ and $x_j$ are the vector measurements of subimages. It can be shown
that, for binary random patterns, the transition probability depends only
on the transition probabilities of the neighboring subimages, which is a
very important result in contextual analysis. This result may be generalized
to grayscale pictures. In general, the dependence on the eight neighboring
subimages is most important in contextual analysis.

The use of the neighbor dependence approximation for spatial patterns
was first studied by Chow [2]. The tree dependence he considered [3] can
also be generalized to spatial patterns. In this case each subimage will
depend on certain surrounding subimages in addition to the eight neighboring
subimages. For image interpretation or classification, compound decision
theory provides a theoretical framework for decision making using the
contextual information. However, the practical implementation of the compound
decision rule has been limited to Markov dependence or neighbor dependence.
Assume that a subimage depends only on its four adjacent subimages in the
east-west and north-south directions. Then the decision rule is to choose
the class that maximizes the likelihood function [4]
\[
P(\theta)\, P(x \mid \theta) \prod_{i=1}^{4} P(x_i \mid \theta) \tag{3}
\]
where $x$ and $x_i$ correspond to vector measurements of the subimage under
consideration and its neighboring subimages, respectively, and
$\theta = 1, 2, \ldots, m$, with $m$ being the number of classes. Here
where $\theta_i = 1, 2, \ldots, m$ and $P(\theta_i \mid \theta)$ is the
transition probability, which is usually determined experimentally. If the
four subimages in the four corners are also considered, then the likelihood
function for the eight-neighbor dependence case is
\[
P(\theta)\, P(x \mid \theta) \prod_{i=1}^{8} P(x_i \mid \theta) \tag{5}
\]


where $P(x_i \mid \theta)$ is also given by Eq. (4). The simple result given
by Eqs. (3) and (5) is due to the assumption of conditional independence
between the subimage and the neighboring subimages, and the assumption that
the contribution due to the adjacent subimages is independent of that due to
the subimages in the four corners. Both assumptions are reasonable in theory
though quite restrictive in practice. For example, the occurrence of one
class at one subimage will affect the occurrences of classes at its
neighboring subimages. Obviously, the results are not valid for dependence
on an arbitrary set of subimages in the neighborhood. Another problem with
the compound decision rule is that the conditional probability densities are
usually not available and the a priori information may not be accurate.
Experimental results based on a Gaussian assumption for the probability
densities have consistently demonstrated a performance improvement with the
use of contextual information [4], [5]. Empirically, it is possible to
determine these densities from the histogram of gray levels of each subimage,
which corresponds to an unconditional probability density [6].
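
The four-neighbor decision rule described above can be illustrated with a
minimal sketch. It assumes scalar measurements, Gaussian class-conditional
densities, and a neighbor term obtained by summing over the neighbor class
$\theta_i$, i.e. $P(x_i \mid \theta) = \sum_{\theta_i} P(x_i \mid \theta_i)
P(\theta_i \mid \theta)$. That form, the function names, and all numerical
values are assumptions made for illustration only.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a scalar Gaussian; stands in for P(x | class)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def neighbor_likelihood(x_i, theta, params, transition):
    """Assumed form of P(x_i | theta): sum over neighbor classes theta_i
    of P(x_i | theta_i) * P(theta_i | theta)."""
    return sum(
        gaussian_pdf(x_i, *params[t_i]) * transition[theta][t_i]
        for t_i in range(len(params))
    )

def classify_with_context(x, neighbors, priors, params, transition):
    """Return the class maximizing P(theta) P(x|theta) prod_i P(x_i|theta)."""
    best_class, best_score = None, -math.inf
    for theta in range(len(priors)):
        # Work in the log domain to avoid underflow in the product.
        score = math.log(priors[theta])
        score += math.log(gaussian_pdf(x, *params[theta]))
        for x_i in neighbors:
            score += math.log(
                neighbor_likelihood(x_i, theta, params, transition))
        if score > best_score:
            best_class, best_score = theta, score
    return best_class

# Hypothetical two-class setup: (mean, std) per class, equal priors, and
# neighbors that tend to share the class of the center subimage.
params = [(50.0, 10.0), (70.0, 10.0)]
priors = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]

# A center measurement slightly closer to class 0 ...
no_context = classify_with_context(59.0, [], priors, params, transition)
# ... is reassigned when its four neighbors resemble class 1.
with_context = classify_with_context(
    59.0, [72.0, 68.0, 71.0, 69.0], priors, params, transition)
print(no_context, with_context)
```

With an empty neighbor list the rule reduces to the ordinary maximum
likelihood decision, which makes the contribution of the context easy to see.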

As the statistical contextual analysis based on the neighborhood
dependence model described above is quite restrictive in practice, further
development in image modeling is much needed for contextual analysis.
III. STOCHASTIC RANDOM FIELD


The  Markov  random  field  is  the  most  typical  assumption  in  statistical 
models.  Wong  [7]  considered  the  properties  of  a  two-dimensional  random 
field  having  finite  first  and  second  moments.  He  found  that  there  is  no 
continuous  Gaussian  random  field  of  two  dimensions  which  is  both  homogeneous 
and  Markov  of  degree  1.  A  homogeneous  random  field  has  a  covariance  function 
that  is  invariant  under  translation  as  well  as  rotation.  Woods  [8]  considered 
a  more  general  definition  of  Markov  mesh  than  that  discussed  in  the  last 
section.  Hassner  and  Sklansky  [9]  also  discussed  a  Markov  random  field  model 
for  images.  They  presented  an  algorithm  that  generates  a  texture  from  an 
initial  random  configuration  and  a  set  of  independent  parameters  that  specify 
a  consistent  collection  of  nearest  neighbor  conditional  probabilities  which 
characterize  the  Markov  random  field. 
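
As a rough illustration of generating a texture from an initial random
configuration and nearest-neighbor conditional probabilities, the sketch
below runs a simple site-by-site resampling scheme on a binary field. It is
not the algorithm of [9]; the auto-logistic conditional, the single
`coupling` parameter, and the function names are assumptions chosen for
brevity.

```python
import math
import random

def gibbs_texture(rows, cols, coupling, sweeps, rng):
    """Binary texture from a nearest-neighbor conditional probability:
    P(x_rs = 1 | neighbors) = 1 / (1 + exp(-coupling * (2*k - n))),
    where k of the n available 4-neighbors are equal to 1."""
    # Initial random configuration.
    x = [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    for _ in range(sweeps):
        for r in range(rows):
            for s in range(cols):
                nbrs = [x[rr][ss]
                        for rr, ss in ((r - 1, s), (r + 1, s),
                                       (r, s - 1), (r, s + 1))
                        if 0 <= rr < rows and 0 <= ss < cols]
                k = sum(nbrs)
                p_one = 1.0 / (1.0 + math.exp(-coupling * (2 * k - len(nbrs))))
                # Resample this pixel from its conditional distribution.
                x[r][s] = 1 if rng.random() < p_one else 0
    return x

def neighbor_agreement(x):
    """Fraction of horizontally adjacent pixel pairs that share a value."""
    pairs = [(row[s], row[s + 1]) for row in x for s in range(len(row) - 1)]
    return sum(a == b for a, b in pairs) / len(pairs)

# Positive coupling favors smooth regions; negative coupling favors
# alternating patterns.
smooth = gibbs_texture(16, 16, 1.0, 20, random.Random(0))
rough = gibbs_texture(16, 16, -1.0, 20, random.Random(0))
print(neighbor_agreement(smooth), neighbor_agreement(rough))
```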

For many practical applications, such as those in the military area, the
homogeneous random field assumption is not valid because of the object
boundaries. Nahi and Jahanshahi [10] suggested modeling the image as a
background statistical process combined with a set of foreground statistical
processes, each replacing the background in the regions occupied by the
objects being considered. Let

    $b(m,n)$ = gray level at the mth row and nth column,

    $\gamma(m,n)$ = a binary function carrying the boundary information,

    $b_b$ = a sample gray level from the background process,

    $b_o$ = a sample gray level from the object process, and

    $v$ = a sample gray level from the noise process,

then the model can be written as
\[
b(m,n) = \gamma(m,n)\, b_o(m,n) + [1 - \gamma(m,n)]\, b_b(m,n) + v(m,n) \tag{6}
\]
where $\gamma$ incorporates the assumption of a first order Markov process on
the object boundaries.
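
The composite model above can be sketched by combining object, background,
and noise samples pixel by pixel. In this sketch the binary boundary function
is a fixed rectangle rather than a first order Markov process, and the two
gray-level processes are independent Gaussians; both choices, along with the
function name `composite_image`, are simplifying assumptions.

```python
import random

def composite_image(mask, background, obj, noise_std, rng):
    """Combine object and background samples pixel by pixel:
    b = gamma * b_o + (1 - gamma) * b_b + v, with gamma in {0, 1}."""
    rows, cols = len(mask), len(mask[0])
    image = []
    for m in range(rows):
        row = []
        for n in range(cols):
            gamma = mask[m][n]
            v = rng.gauss(0.0, noise_std)  # additive noise sample
            row.append(gamma * obj[m][n] + (1 - gamma) * background[m][n] + v)
        image.append(row)
    return image

rng = random.Random(0)
rows, cols = 6, 8
# Hypothetical processes: independent Gaussian gray levels, different means.
background = [[rng.gauss(40.0, 2.0) for _ in range(cols)] for _ in range(rows)]
obj = [[rng.gauss(120.0, 2.0) for _ in range(cols)] for _ in range(rows)]
# Binary boundary function: an object occupying rows 2-3, columns 3-5.
mask = [[1 if 2 <= m <= 3 and 3 <= n <= 5 else 0 for n in range(cols)]
        for m in range(rows)]

image = composite_image(mask, background, obj, noise_std=1.0, rng=rng)
for row in image:
    print(" ".join(f"{value:6.1f}" for value in row))
```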


Huang  modeled  image  scan  lines  as  a  Markov  jump  process  [11].  This 
model  led  to  non-linear  noise  reduction  and  image  segmentation  algorithms 
that  are  superior  to  linear  techniques.  The  recursive  calculation  of  a 
conditional  probability  involving  the  boundary  component  of  the  scan  lines 
was  the  key  to  the  non-linear  algorithms.  Modestino  [12]  modeled  the  image 
as  a  marked  point  process  evolving  according  to  a  spatial  parameter.  In 
another  approach  the  image  is  considered  as  a  spatially  variant  linear  sys¬ 
tem  superimposed  by  non-linear  elements  corresponding  to  object  boundaries. 
Ingle and Woods [13] considered the use of a bank of Kalman filters
corresponding to various correlation directions and demonstrated a
considerable improvement in visual quality compared with linear constant
coefficient Kalman filtering. Chen [14] has employed an adaptive Kalman
filter that operates a generalized likelihood ratio test in parallel with
the Kalman filter. An object boundary corresponds to a state jump that is
detected and used to update the Kalman filter. It appears that both textural
and temporal variations can be properly taken into account in the image
enhancement.

IV.  LOCAL  AND  GLOBAL  MODELS 

Local statistical image models emphasize the use of local statistics,
while global models attempt a description of the random field by using the
information from the entire field. In the absence of any knowledge or
assumption about the global process underlying a given image, one may attempt
to describe the joint probability density of the gray level or other
properties of the picture elements. To do this for the entire image involves
an extremely high dimensional space, which is unrealistic. It would be easier
to consider a small neighborhood. However, even for a 3 x 3 neighborhood,
a nonparametric representation in a 9 dimensional space, along with the
associated sample size and storage problems, still presents serious
difficulty. Thus even for local models, it would be desirable to "compress"
the local properties to a low dimensional space. Local description of
co-occurrence statistics for textures (e.g. [15]) uses only 2 x 1
neighborhoods. Different features can then be derived from the co-occurrence
matrix for texture classification.
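
A co-occurrence matrix over pairs of adjacent picture elements can be
computed as in the following sketch. The displacement arguments `dr` and
`dc`, the toy image, and the choice of contrast as the derived feature are
illustrative assumptions rather than the specific features of [15].

```python
def cooccurrence(image, levels, dr=0, dc=1):
    """Count gray-level pairs (image[r][c], image[r + dr][c + dc])."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def contrast(counts):
    """Contrast feature: (i - j)^2 weighted by the normalized counts."""
    total = sum(sum(row) for row in counts)
    return sum((i - j) ** 2 * n / total
               for i, row in enumerate(counts) for j, n in enumerate(row))

# Toy image with four gray levels arranged in 2 x 2 blocks.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [2, 2, 3, 3],
         [2, 2, 3, 3]]
horizontal = cooccurrence(image, levels=4)
print(horizontal)
print(contrast(horizontal))
```

Other displacements (for example `dr=1, dc=0` for vertical pairs) give
matrices sensitive to texture in other directions.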

Most of the local models, however, use conditional probabilities of
picture elements within a window, instead of their joint probability
distributions described above. The Markov dependence assumption will make a
picture element depend upon its neighbors. Let r and s be the row number and
the column number associated with a picture element x. A conditional
nearest-neighbor model is
\[
P[x_{rs} \mid \text{all other values}]
  = P[x_{rs} \mid x_{r-1,s},\, x_{r+1,s},\, x_{r,s-1},\, x_{r,s+1}] \tag{7}
\]

and is also known as a Markov field. An efficient procedure to take into
account the local dependence is the statistical theory of nearest-neighbor
systems on a lattice [16]. If we consider the four neighbors in the east-west
and north-south directions, then we have a non-causal model given by

\[
x_{rs} = \beta_1 (x_{r-1,s} + x_{r+1,s}) + \beta_2 (x_{r,s+1} + x_{r,s-1})
  + \gamma_{rs} \tag{8}
\]
where $r = 1, 2, \ldots, M$; $s = 1, 2, \ldots, N$; $\beta_1$ and $\beta_2$
are the coefficients to be determined; and $\{\gamma_{rs}\}$ is an
uncorrelated Gaussian noise process with $E[\gamma_{rs}] = 0$ and
$\mathrm{var}[\gamma_{rs}] = \sigma_i^2$, $i = 1, 2, \ldots, m$, where $m$ is
the number of pattern classes under consideration. The coefficients $\beta_1$
and $\beta_2$ may also differ among various classes. A special case of
Eq. (8) is the causal model given
by
\[
x_{rs} - \mu_i = \beta_i \left[ (x_{r-1,s} - \mu_i) + (x_{r,s-1} - \mu_i) \right]
  + \gamma_{rs} \tag{9}
\]
Each class corresponds to a set of parameters $(\mu_i, \sigma_i, \beta_i)$.
The parameter values are given or estimated for each class. Classification
consists in deciding which of the given sets of parameter values best
describes, in terms of probabilities, the image to be classified. It is
noted that the model given by Eq. (9) permits discrimination between
classes $\omega_i$ and $\omega_j$ even when
$(\mu_i, \sigma_i) = (\mu_j, \sigma_j)$, so long as $\beta_i \neq \beta_j$.
That is, the model performs classification by using the information about the
inter-pixel correlation as well as the mean and standard deviation of
gray levels. Both Eqs. (8) and (9) represent first order autoregressive
models, where the autoregression parameters $\beta$ describe the spatial
correlation. An alternate expression for the model described by Eq. (9) is


\[
x_{rs} = \alpha_i + \beta_i \left[ x_{r-1,s} + x_{r,s-1} \right] + \gamma_{rs}
       = \alpha_i + \beta_i Z_{rs} + \gamma_{rs} \tag{10}
\]
where $\alpha_i = (1 - 2\beta_i)\mu_i$ and $Z_{rs} = x_{r-1,s} + x_{r,s-1}$.
Here $\alpha_i$ and $\beta_i$ can be estimated from ordinary simple linear
regression by the method of least squares. The maximum likelihood decision
rule can be used to classify $x_{rs}$, which follows the Gaussian probability
with all parameters represented by their least squares estimates.
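
The regression form above lends itself to a short numerical sketch: simulate
a causal field for each class, estimate the intercept and slope by simple
linear regression of each picture element on the sum of its upper and left
neighbors, and classify a new field by the larger Gaussian log-likelihood.
The simulated data, the border handling, the class labels "a" and "b", and
all function names are assumptions for illustration.

```python
import math
import random

def simulate_causal_field(rows, cols, mu, beta, sigma, rng):
    """Generate x[r][s] = alpha + beta * (x[r-1][s] + x[r][s-1]) + noise,
    with alpha = (1 - 2*beta) * mu and the border held at the mean mu."""
    alpha = (1.0 - 2.0 * beta) * mu
    x = [[mu] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for s in range(1, cols + 1):
            z = x[r - 1][s] + x[r][s - 1]
            x[r][s] = alpha + beta * z + rng.gauss(0.0, sigma)
    return x

def regression_pairs(x):
    """Collect (Z_rs, x_rs) pairs with Z_rs = x[r-1][s] + x[r][s-1]."""
    return [(x[r - 1][s] + x[r][s - 1], x[r][s])
            for r in range(1, len(x)) for s in range(1, len(x[0]))]

def fit_least_squares(x):
    """Simple linear regression of x_rs on Z_rs: (alpha, beta, sigma)."""
    pairs = regression_pairs(x)
    n = len(pairs)
    mean_z = sum(z for z, _ in pairs) / n
    mean_x = sum(v for _, v in pairs) / n
    s_zz = sum((z - mean_z) ** 2 for z, _ in pairs)
    s_zx = sum((z - mean_z) * (v - mean_x) for z, v in pairs)
    beta = s_zx / s_zz
    alpha = mean_x - beta * mean_z
    resid = sum((v - alpha - beta * z) ** 2 for z, v in pairs) / n
    return alpha, beta, math.sqrt(resid)

def log_likelihood(x, alpha, beta, sigma):
    """Gaussian log-likelihood of the residuals under one parameter set."""
    pairs = regression_pairs(x)
    sq = sum((v - alpha - beta * z) ** 2 for z, v in pairs)
    norm = len(pairs) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -norm - sq / (2.0 * sigma ** 2)

rng = random.Random(1)
# Two hypothetical classes: same mean and noise level, different beta.
train_a = simulate_causal_field(40, 40, mu=100.0, beta=0.15, sigma=5.0, rng=rng)
train_b = simulate_causal_field(40, 40, mu=100.0, beta=0.40, sigma=5.0, rng=rng)
classes = {"a": fit_least_squares(train_a), "b": fit_least_squares(train_b)}

unknown = simulate_causal_field(40, 40, mu=100.0, beta=0.40, sigma=5.0, rng=rng)
label = max(classes, key=lambda k: log_likelihood(unknown, *classes[k]))
print(classes, label)
```

Because the two classes share the same mean and noise level, the decision
here rests entirely on the estimated spatial correlation parameter.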


It is noted that one difficulty with the model given by Eq. (8) is
that, even if the $\gamma_{rs}$ and hence the $x_{rs}$ are Gaussian, the
estimation of $\beta_1$ and $\beta_2$ from data is not a simple least-squares
problem, because the Jacobian of the transformation from the noise variable
$\gamma_{rs}$ to the observation $x_{rs}$ is difficult to evaluate. If the
density function of the finite Fourier transform of $x_{rs}$ is considered,
then the Jacobian of the transformation will be unity [17]. Eqs. (8) and (9)
can be generalized to higher order models. For the second order model, the
picture elements identified as 1's and 2's in the following figure should be
included in the linear model.

For the nth order model, the picture elements up to a distance of n should
be included in the model. The expression is given by
\[
x_{rs} = \sum_{i=1}^{n} \beta_i (x_{r-i,s} + x_{r+i,s})
       + \sum_{i=1}^{n} \beta_i' (x_{r,s-i} + x_{r,s+i}) + \gamma_{rs} \tag{11}
\]
where the parameters $\beta_i$, $\beta_i'$ can be estimated by using the
maximum likelihood principle [18] if only causal terms are used in Eq. (11).

For the global models considered, the Gaussian model is an
oversimplification [19] even though it is mathematically tractable. The
stationary Gaussian assumption requires that the mean vector of gray levels
be a vector of identical components. Hunt [20] suggested using a
nonstationary Gaussian model, which allows the mean vector to have
unequal components.

It would be desirable to include some structural property in the
global model. Matheron [21] used the term "regionalized variables"
to emphasize the particular features of the picture elements whose complex
mutual correlation reflects the structure of the underlying phenomenon.
He assumed weak stationarity of the increments in the gray levels between
picture elements. The second moment of the increments in the gray levels
between picture elements at an arbitrary distance, called the variogram,
is used to reflect the structure of the field. Knowledge of the variogram
is useful for the estimates of many global and local properties of the
field. A characterization similar to the variogram is given by the
autocorrelation function. However, for real imagery, good functional forms
of the variogram and autocorrelation are seldom available. A reasonable
approximation must be sought between the functional form of the image model
and the real data considered.
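
An empirical variogram, taken here exactly as described above (the second
moment of the gray-level increments at a given distance), can be estimated as
in this minimal sketch. Only horizontal displacements are used, and the ramp
image is a hypothetical example.

```python
def variogram(image, max_lag):
    """Mean squared gray-level increment between pixels h columns apart,
    for h = 1..max_lag (horizontal direction only)."""
    values = []
    for h in range(1, max_lag + 1):
        diffs = [(row[c + h] - row[c]) ** 2
                 for row in image for c in range(len(row) - h)]
        values.append(sum(diffs) / len(diffs))
    return values

# Hypothetical image: a horizontal ramp, so increments grow with the lag.
ramp = [[2 * c for c in range(8)] for _ in range(4)]
print(variogram(ramp, 3))  # [4.0, 16.0, 36.0]
```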

V.  THE  ARMA  SYSTEMS 

Eq. (11) is the autoregressive model in its general form. A more
general parametric model is the autoregressive moving average (ARMA)
system, which replaces $\gamma_{rs}$ in Eq. (11) by a finite number of
previous $\gamma_{rs}$ values. Of course, only the causal terms are taken in
Eq. (11). The resulting ARMA or mixed model can provide a good representation
of the image with properly chosen coefficients. However, an autoregressive
model of sufficient order should be adequate.

To determine the order of the model, a maximum likelihood decision
rule can be used for choices of neighbors [17]. A simpler procedure is
to use the Akaike Information Criterion (AIC) in each of the two dimensions,
so that the window size can be determined [22] by the final prediction error.
We consider two lines parallel to the horizontal and vertical axes passing
through a point (i,j), and employ the one-dimensional estimators


\[
\hat{x}_{i,j} = \sum_{p=1}^{M} \theta_p^{(M)} x_{i-p,j}, \qquad
\hat{x}_{i,j} = \sum_{q=1}^{N} \theta_q^{(N)} x_{i,j-q} \tag{12}
\]

where M and N are the window sizes in the horizontal and vertical directions
respectively. Let $S_{\min}^{(M)}$ and $S_{\min}^{(N)}$ be the minimum
estimation errors. We determine the optimum values of M and N so that the two
criterion functions
\[
\mathrm{FPE}(M) = \frac{I + M + 1}{I - M - 1}\, S_{\min}^{(M)}, \qquad
\mathrm{FPE}(N) = \frac{J + N + 1}{J - N - 1}\, S_{\min}^{(N)} \tag{13}
\]
are minimized. Here the original image size is assumed to be I x J. A
more accurate procedure would consider the two dimensions jointly and
may lead to a smaller window size because of the correlation among the
adjacent picture elements. Although the AIC may lead to inconsistent
results, it is by far the simplest criterion for determining the window
size.
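
The order selection along one line can be sketched as follows. The minimum
prediction error for each candidate order is obtained here from the sample
autocovariance by the one-dimensional Levinson-Durbin recursion, which is a
choice made for brevity and not necessarily the procedure of [22]. The final
prediction error is then formed with the line length playing the role of the
image dimension. The simulated scan line and all function names are
assumptions.

```python
import random

def autocovariance(x, max_lag):
    """Biased sample autocovariance r[0..max_lag] after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def prediction_errors(r, max_order):
    """Levinson-Durbin recursion: minimum prediction error for each
    order 0..max_order."""
    coeffs, err = [], r[0]
    errors = [err]
    for m in range(1, max_order + 1):
        acc = r[m] - sum(coeffs[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err  # reflection coefficient of order m
        coeffs = [coeffs[j] - k * coeffs[m - 2 - j]
                  for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
        errors.append(err)
    return errors

def select_window(line, max_order):
    """Return (best order, FPE values) for one scan line."""
    length = len(line)
    errors = prediction_errors(autocovariance(line, max_order), max_order)
    fpe = [(length + m + 1) / (length - m - 1) * errors[m]
           for m in range(max_order + 1)]
    best = min(range(1, max_order + 1), key=lambda m: fpe[m])
    return best, fpe

# Hypothetical scan line: a second order autoregressive sequence.
rng = random.Random(3)
line = [0.0, 0.0]
for _ in range(600):
    line.append(0.6 * line[-1] - 0.5 * line[-2] + rng.gauss(0.0, 1.0))

best, fpe = select_window(line, max_order=6)
print(best, [round(v, 3) for v in fpe])
```

Applying `select_window` separately to a row and a column through the point
of interest gives the two window sizes.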


The coefficients in the autoregressive model should be determined
from the autocorrelation function. However, it would be desirable to
develop an efficient two-dimensional Levinson recursion to compute the
coefficients. The frequency domain analysis and the maximum entropy
method are alternative procedures for estimation of coefficients.

The study of ARMA systems for image analysis is still in its infancy.
Further development of ARMA systems is much needed for texture
characterization and object boundary extraction.


VI.  APPLICATIONS  OF  STATISTICAL  IMAGE  MODELS 


In the ARMA systems, images are described by a few coefficients or
parameters, which may be coded for image transmission instead of transmitting
the whole picture. This represents an important approach to image
compression. Although a limited amount of effort has been made so far in this
direction [23], development in two-dimensional ARMA models will find great
applicability in image transmission.

For image interpretation and classification, image models must
emphasize discrimination information such as the local statistics, as
the object and background must be statistically different. Different
objects must have different model parameters. Although most image models
are designed to represent or characterize an image rather than to
discriminate among different objects or classes, useful discrimination
information is contained in the image models.

Because of the large variety of images, many different image models
are available. Statistical models are most effective for images which
are rich in texture. The image processing techniques employed are often
determined by image modeling. For example, Kalman filtering is
particularly suitable for images modeled as a spatially variant linear system
with additive noise. For images modeled with a multiplicative disturbance,
a different image processing technique is required. On the other hand,
preprocessed images usually make image modeling easier. Image models also
provide useful knowledge about the image for image segmentation and
restoration.

In summary, image modeling in general, and statistical image modeling
in particular, provides an abstraction of the rich information of various
kinds in the images and should be considered an integral part of image
analysis and synthesis.